Loop structure of the Internet at the Autonomous System Level 
We present a study of the clustering and loops in the graph of the Internet at the Autonomous
Systems level. We show that, even though the whole structure changes with time, the statistical
distributions of loops of order 3, 4, 5 remain stable during the evolution. Moreover, we bring
evidence that the Internet graphs show characteristic Markovian signatures, since their structure is
very well described by the two-point correlations between the degrees of the vertices. This indeed
proves that the Internet belongs to a class of networks in which the two-point correlation is sufficient
to describe all their local (and thus global) structure. Data are also compared to present Internet
models.

PACS numbers: 89.75.Hc, 89.75.Da, 89.75.Fb



In the last five years the physics community has started to look at the Internet as a
beautiful example of a complex system with many degrees of freedom resulting in global
scaling properties. The Internet can in fact be described as a network, with vertices and
edges representing respectively Autonomous Systems (AS) and the physical lines connecting
them. Moreover, it has been shown [1, 2] that it belongs to the wide class of scale-free
networks [3, 4] emerging as the underlying structure of a variety of real complex systems.
But, besides the common scale-free connectivity distribution, what distinguishes networks
as different as the social networks of interactions and technological networks such as the
Internet? Researchers have therefore started to characterize networks further by
introducing different topological quantities besides the degree distribution exponent.
Among those are the clustering coefficient $C(k)$ and the average nearest-neighbor degree
$k_{nn}(k)$ of a vertex as a function of its degree $k$ [7, 8]. In particular, measurements
in the Internet yield $C(k) \sim k^{-\mu}$ with $\mu \simeq 0.75$ and
$k_{nn}(k) \sim k^{-\nu}$ with $\nu \simeq 0.5$ [9]. A two-vertex degree anti-correlation
has also been measured [10]. Accordingly, the Internet is said to display disassortative
mixing [11], because nodes prefer to be linked to peers with a different degree rather than
a similar one. This situation is opposed to that in social networks, where we observe the
so-called assortative mixing.

Moreover, the modularity of the Internet due to the national patterns has been studied by
measuring the slowly decaying modes of a diffusion process defined on it [12]. Recently,
more attention has been devoted to network motifs [13, 14], i.e. subgraphs appearing with a
frequency larger than that observed in maximally random graphs with the same degree
sequence. Among those, the most natural class includes loops [15, 16, 17, 18], closed paths
of various lengths that visit each node only once. Loops are interesting because they
account for the multiplicity of paths between any two nodes. Therefore, they encode the
redundant information in the network structure.

In this paper we present data on the scaling of the loops of length $h \le 5$ in the
Internet graph and show that this scaling is very well reproduced by the two-point
correlation matrix between the degrees of linked pairs of vertices. This allows us to
suggest that the Internet is "Markovian", i.e. correlations of order higher than two are
negligible. We then study the structure of the graph under the two-point correlation
assumption, with the goal of characterizing the cycle structure of the Internet and
defining an upper limit for the scaling of the number of loops with the system size, valid
for all possible loop lengths.

To measure the number of loops in an undirected network we consider its symmetric
adjacency matrix $\{a_{ij}\}$, with $a_{ij} = 1$ if $i$ and $j$ are connected and
$a_{ij} = 0$ otherwise. If no self-loops (self-links in a vertex) are present, i.e.
$a_{ii} = 0$ for all $i$, the number of loops of length $h$ is given by a dominant term of
the type $\mathrm{Trace}(a^h)/(2h)$, which counts the total number of closed paths of
length $h$, minus all the contributions coming from intersecting paths. For $h = 3$ these
terms are absent and the total number of loops $N_3$ of length $h = 3$ is given by

$$N_3 = \frac{1}{6} \sum_i (a^3)_{ii}. \qquad (1)$$

In the case of short loops, $h \le 5$, these terms can be easily evaluated and give the
expressions for the total number of loops of size $h = 4, 5$, $N_4$ and $N_5$ [15].
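
As an illustration of the trace-based counting described above, the following minimal sketch counts triangles and 4-loops from an adjacency matrix. It assumes a simple undirected graph stored as a dense NumPy array. The function name `count_short_loops` is hypothetical, and the 4-loop correction used here is the standard closed-walk identity, which is not necessarily written in the same form as the expressions of Ref. [15].

```python
import numpy as np

def count_short_loops(adj):
    """Return (N3, N4) for a simple undirected graph given as a 0/1 adjacency matrix."""
    a = np.asarray(adj, dtype=np.int64)
    # The identities below assume a symmetric matrix with no self-links (a_ii = 0).
    assert (a == a.T).all() and not a.diagonal().any()
    k = a.sum(axis=1)                # degree of every node
    a2 = a @ a
    tr3 = np.trace(a2 @ a)           # closed walks of length 3
    tr4 = np.trace(a2 @ a2)          # closed walks of length 4
    n3 = tr3 // 6                    # each triangle is walked 2 * 3 times
    # Subtract the walks that backtrack along edges, then divide by 2 * 4.
    n4 = (tr4 - 2 * (k ** 2).sum() + k.sum()) // 8
    return int(n3), int(n4)

# Complete graph on 4 nodes: 4 triangles and 3 distinct 4-loops.
k4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
print(count_short_loops(k4))
```

Dense matrix products are used only for brevity; for graphs of the size of the AS maps a sparse representation would be the more natural choice.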
To measure the actual scaling in the Internet at the AS level, we used Eqs. (1)-(2). The
data of the Internet at the Autonomous System level are collected by the University of
Oregon Route Views Project and made available by the NLANR (National Laboratory of Applied
Network Research). The subsets we used in this manuscript are mirrored at the COSIN web
page http://www.cosin.org. We considered 13 snapshots of the Internet network at the AS
level at different times, from November 1997 (when $N = 3015$) to January 2001
($N = 9048$). Throughout this period, the degree distribution is a power law with a nearly
constant exponent $\gamma \simeq 2.22(1)$. Using relations (1), (2), we measure $N_h(t)$
for $h = 3, 4, 5$ in the Internet at different times, corresponding to different network
sizes. We observe in Fig. 1 that the data follow a scaling of the type

$$N_h(N) \sim N^{\xi(h)} \qquad (3)$$

with the exponents $\xi(h)$ reported in Table I.
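
A scaling exponent of this kind is commonly estimated as the slope of a straight-line fit in log-log coordinates. The sketch below shows one way to do that. The function name `fit_scaling_exponent` and the synthetic numbers are illustrative assumptions, not the fitting procedure or the data used for Table I.

```python
import numpy as np

def fit_scaling_exponent(sizes, loop_counts):
    """Least-squares slope of log N_h versus log N, i.e. an estimate of xi(h)."""
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_nh = np.log(np.asarray(loop_counts, dtype=float))
    slope, intercept = np.polyfit(log_n, log_nh, 1)
    return slope, np.exp(intercept)

# Synthetic check: counts generated exactly as N_h = 0.5 * N**1.45.
sizes = np.array([3000.0, 4500.0, 6000.0, 9000.0])   # illustrative sizes only
counts = 0.5 * sizes ** 1.45
xi, prefactor = fit_scaling_exponent(sizes, counts)
print(round(xi, 3), round(prefactor, 3))
```

With only 13 snapshots spanning less than a decade in $N$, the uncertainty of such a fit is not negligible, which is presumably why the exponents are quoted with error bars.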

To model the Internet means to find a class of networks, defined by a stochastic
algorithm, that share the main characteristics of the Internet graph. Consequently, we
suppose that the real Internet graph belongs to a certain ensemble of graphs and is
actually a realization of it. Supposing one knows this ensemble, in order to evaluate the
number of loops one would theoretically need to know the entire probability distribution
for each element of the adjacency matrix, i.e. the probability distribution
$P(a_{1,1}, \ldots, a_{1,N}, \ldots, a_{N,1}, \ldots, a_{N,N})$. Let us make the assumption
that the probability for a set of $h$ nodes to be connected depends only on their
connectivities. The zero order approximation to Eqs. (1)-(2) would then be to assume that
the connectivities of the nodes are completely uncorrelated, and the formula for the
calculation of the loops of size $h$ would be
Given a distribution $P(k) \sim k^{-\gamma}$ with a cutoff at $k_c \sim N^{1/\chi}$ we get
the scaling prediction Eq. (3) with $\xi(h) = h(3-\gamma)/\chi$, in the relevant case
$2 < \gamma < 3$. In the special case of an uncorrelated graph with $\gamma = 3$ we obtain
the scaling behavior $N_h(N) \sim (\log N)^{\psi(h)}$, with $\psi(h) = h$. Interestingly
enough, the same calculation is exactly valid also in a Barabasi-Albert network, which is
an off-equilibrium network but with zero correlations. We need to observe that the very
fact that in the Internet data the exponent $\chi$ follows Eq. (5)
indicates that the network is strongly correlated; in fact, for uncorrelated networks we
would expect $1/\chi = 1/2$.

The real exponents $\xi(h)$, as expected, depend on $h$, but they differ significantly
from the zero order approximation values $\xi(h) = h(3-\gamma)/\chi$, with $\chi$ given by
Eq. (5) and $\gamma \simeq 2.22$ (see Table I). So the correlated nature of the Internet
cannot be neglected when one looks at the scaling of the loops in the network.
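
For orientation, the zero order approximation can be evaluated numerically. The sketch below rests on two assumptions: it uses the standard uncorrelated estimate $N_h \simeq (1/2h)\,(\langle k(k-1)\rangle/\langle k\rangle)^h$, which may not be written exactly as in the paper, and it treats $\chi$ as a free parameter. The function names are hypothetical.

```python
import numpy as np

def zero_order_loops(degrees, h):
    """Uncorrelated estimate N_h ~ (1/2h) * (<k(k-1)>/<k>)**h (assumed form)."""
    k = np.asarray(degrees, dtype=float)
    ratio = (k * (k - 1)).mean() / k.mean()
    return ratio ** h / (2 * h)

def zoa_exponent(h, gamma, chi):
    """xi(h) = h (3 - gamma) / chi for 2 < gamma < 3 and a cutoff k_c ~ N**(1/chi)."""
    return h * (3.0 - gamma) / chi

# chi = 1.0 is a placeholder chosen for illustration, not a value quoted in the text.
for h in (3, 4, 5):
    print(h, zoa_exponent(h, gamma=2.22, chi=1.0))
```

The first function needs only a degree sequence, which makes explicit that the zero order approximation ignores everything about the graph except its degree distribution.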

FIG. 1: Number of $h$-loops $N_h$ as a function of the system size $N$, shown with empty
symbols for loops of length 3, 4, 5 (circles, squares and diamonds). The solid line reports
the first order approximation and the dashed line the power-law fit to the data. The inset
reports the logarithm of the largest eigenvalue of the matrix $C$ as a function of the
system size.

The first order approximation for Eqs. (1)-(2) consists in taking into account that the
connectivities of the nodes are correlated. In order to calculate the number of small
loops in the network one can approximate $N_h \simeq \mathrm{Trace}(a^h)/(2h)$. In a loop
all the nodes are equivalent, having two links: once a direction on the loop is fixed, one
link is used to reach the given node and the other link to reach the subsequent one. The
probability that a node of degree $k_1$, already part of the loop, is connected to a
successive node of degree $k_2$ is given by $(k_1 - 1)P(k_2|k_1)$, since we can decide to
follow one of its remaining $k_1 - 1$ links. (In our notation $P(k|k')$ indicates the
probability that following one link starting at a node of degree $k'$ one reaches a node
with connectivity $k$.) Consequently, the number of loops of size $h$ in this first order
approximation is given by



$$N_h \simeq \frac{\mathrm{Trace}(C^h)}{2h}, \qquad (6)$$

where the matrix $C$ is defined as

$$C_{k,k'} = (k' - 1)\,P(k|k'). \qquad (7)$$
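
The first order approximation can be sketched in code as follows. This is an illustration under stated assumptions, not the procedure used to produce the FOA results: the graph is given as an edge list of a simple undirected graph, $P(k|k')$ is estimated by counting link ends between degree classes, and the names `correlation_matrix` and `first_order_loops` are hypothetical.

```python
from itertools import combinations

import numpy as np

def correlation_matrix(edges):
    """Build C[k, k'] = (k' - 1) * P(k | k') from an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    classes = sorted(set(degree.values()))             # distinct degree values
    index = {k: n for n, k in enumerate(classes)}
    # links[a, b]: number of link ends leaving nodes of degree classes[b]
    # and arriving at nodes of degree classes[a] (each edge counts both ways).
    links = np.zeros((len(classes), len(classes)))
    for u, v in edges:
        a, b = index[degree[u]], index[degree[v]]
        links[a, b] += 1
        links[b, a] += 1
    cond = links / links.sum(axis=0, keepdims=True)    # column b holds P(k | k'_b)
    weights = np.array(classes, dtype=float) - 1.0     # factor (k' - 1) per column
    return cond * weights[np.newaxis, :], classes

def first_order_loops(edges, h):
    """First order estimate N_h ~ Trace(C**h) / (2h)."""
    c, _ = correlation_matrix(edges)
    return np.trace(np.linalg.matrix_power(c, h)) / (2 * h)

# Complete graph on 5 nodes: one degree class, so C is the 1x1 matrix (4 - 1).
k5_edges = list(combinations(range(5), 2))
print(first_order_loops(k5_edges, 3))
```

Because $C$ is indexed by degree classes rather than by nodes, its size is set by the number of distinct degrees in the graph, which is much smaller than $N$ for the snapshots considered here.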



FIG. 2: The rescaled spectra of the matrix $C$ calculated over the 13 snapshots of the
Internet under study.

Of course, for higher order loops it will not be possible to neglect the contributions of
intersecting paths, but Eq. (6) would still provide an upper limit to the behavior of
$N_h(N)$. In Fig. 1 we compare the real data with the first order approximation given by
Eqs. (6)-(7). It is clear that this approximation captures most of the cycle structure, at
least for small values of $h$. Since we observe this peculiar characteristic of the
Internet graphs, it is worth looking at the structure of the matrix $C$. Indeed, the matrix
$C$ is characterized by a spectrum with eigenvalues $\Lambda$ which scale as

$$\Lambda(N) \sim N^{\theta}, \qquad (8)$$

where $\theta = 0.47 \pm 0.01$. In Fig. 2 we show how this spectrum scales for the
different snapshots of the Internet at the Autonomous System level. The largest eigenvalue
$\Lambda_{\max}(N)$ is the one of most interest to us in this letter, since it is
responsible for the behavior of $N_h$ at large $N$. Indeed, we can estimate an upper limit
for the scaling of the loops of generic length $h$ with the system size, i.e.
$N_h < O(\Lambda_{\max}^h / 2h)$, where this scaling is supposed to be valid until
$h \ll h^*$, where some arguments support
the scaling h* ~ N ( 2 T> for random scale- free graphs 
and $h^* \sim N^{1/(\gamma-1)}$ for correlated graphs (for the behavior of the number of
loops at large $h$ in regular random graphs, see [24]). In order to fully characterize the
cycle structure of the Internet it is then natural to study the structure of the
eigenvector associated with the largest eigenvalue. For this vector $u_k$ we also observe a
scaling behavior

$$u_k = k^{\alpha} f(k/k_c) \qquad (9)$$

where $f(x) = 1$ for $x \ll 1$ and $f(x) = x^{\beta}$ for $x \gg 1$, with
$\alpha = -2.50 \pm 0.05$ and $\beta = 3.10 \pm 0.05$.
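
The role of the largest eigenvalue can be made explicit with a short estimate. The following is only a sketch: it assumes that the trace in Eq. (6) is dominated by the leading eigenvalue, and it reuses the measured value $\theta = 0.47$ from Eq. (8).

```latex
\begin{align*}
\mathrm{Trace}(C^h) &= \sum_i \lambda_i^h \simeq \Lambda_{\max}^h
  \qquad \text{(leading eigenvalue assumed dominant)} \\
N_h &\simeq \frac{\mathrm{Trace}(C^h)}{2h} \lesssim \frac{\Lambda_{\max}^h}{2h}
  \sim \frac{N^{\theta h}}{2h} \\
\theta h &\approx 1.41,\ 1.88,\ 2.35 \quad \text{for } h = 3, 4, 5 \quad (\theta = 0.47)
\end{align*}
```

Under this assumption the products $\theta h$ lie close to the FOA exponents listed in Table I (1.34, 1.86, 2.25), which is consistent with the largest eigenvalue controlling the scaling of $N_h$ at large $N$.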

To make a comparison between the real data and the models present in the literature at
the moment, we consider the Fitness model [25], the Generalized Network Growth Model (GNG)
[26] and the Competition and Adaptation Model [23] with (D) and without (ND) distance
constraints. The fitness model has indeed $\gamma = 2.255$ and the GNG model has a
power-law exponent that depends on the intrinsic parameter $p$,
$\gamma(p) = 2 + p/(2-p)$. In order to compare networks with a similar mean degree
($\langle k \rangle \in (3.4, 4.0)$ [23] for the Internet), we consider the fitness model
with $m = 2$ ($\langle k \rangle = 2m = 4$) and the GNG model with parameter $p = 0.5$
($\langle k \rangle = 2/p = 4$) and $p = 0.6$ ($\langle k \rangle = 2/p = 3.33$). All
models present non-trivial correlations of the nodes, as can be seen by observing the
$C(k)$ and $k_{nn}(k)$ functions.

FIG. 3: The clustering coefficients $c_3(k)$ and $c_4(k)$ in the Internet for the data of
November '97 (circles), January '99 (squares) and January '01 (triangles). Filled symbols
show the same results obtained in the first order approximation. Solid and dashed lines
indicate the power-law fits to the data and to the first order approximation results,
respectively.

In Table I we compare the $\xi(h)$ exponents of the real data with the exponents
numerically calculated for the considered models. While $\xi(h)$ grows almost linearly
with $h$, as expected, we observe that the D and ND models seem to reproduce the data best.



System        ξ(3)           ξ(4)           ξ(5)
AS            1.45 ± 0.07    2.07 ± 0.01    2.45 ± 0.08
ZOA           2.26 ± 0.06    3.15 ± 0.07    3.94 ± 0.09
FOA           1.34 ± 0.03    1.86 ± 0.04    2.25 ± 0.05
Fitness       0.59 ± 0.02    0.86 ± 0.02    1.10 ± 0.02
GNG (p=0.5)   0.53 ± 0.03    0.72 ± 0.03    0.96 ± 0.02
GNG (p=0.6)   0.53 ± 0.03    0.74 ± 0.03    0.99 ± 0.02
D             1.60 ± 0.01    2.20 ± 0.03    2.70 ± 0.03
ND            1.59 ± 0.03    2.11 ± 0.03    2.64 ± 0.03

TABLE I: The exponent ξ(n) for n = 3, 4, 5 as defined in Eq. (3) for real data, in the
zero order approximation (ZOA) and in the first order approximation (FOA), and for
network models.

Following earlier work, we also measured the clustering coefficients $c_{3,i}$ and
$c_{4,i}$ as a function of the connectivity $k_i$ of node $i$, for all $i$. In particular,
$c_{3,i}$ is the usual clustering coefficient $C$, i.e. the number of triangles including
node $i$ divided by the number of possible triangles $k_i(k_i - 1)/2$.
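
The definition of $c_{3,i}$ translates directly into a few lines of code. The sketch below assumes a dense 0/1 adjacency matrix of a simple undirected graph and assigns $c_3 = 0$ to nodes with fewer than two neighbors, which is a convention chosen here rather than one stated in the text. The function names are hypothetical.

```python
import numpy as np

def local_clustering(adj):
    """c_{3,i}: triangles through node i divided by k_i (k_i - 1) / 2."""
    a = np.asarray(adj, dtype=float)
    k = a.sum(axis=1)
    triangles = np.diagonal(a @ a @ a) / 2.0    # (a^3)_ii counts each triangle twice
    possible = k * (k - 1) / 2.0
    # Convention assumed here: nodes with k_i < 2 get c_3 = 0.
    return np.divide(triangles, possible,
                     out=np.zeros_like(possible), where=possible > 0)

def clustering_by_degree(adj):
    """Average c_{3,i} over all nodes of the same degree, giving c_3(k)."""
    a = np.asarray(adj, dtype=float)
    k = a.sum(axis=1).astype(int)
    c3 = local_clustering(a)
    return {int(deg): float(c3[k == deg].mean()) for deg in np.unique(k)}

# Triangle 0-1-2 with a pendant node 3 attached to node 0.
example = np.array([[0, 1, 1, 1],
                    [1, 0, 1, 0],
                    [1, 1, 0, 0],
                    [1, 0, 0, 0]])
print(clustering_by_degree(example))
```

Averaging over nodes of equal degree gives the function $c_3(k)$; counting quadrilaterals for $c_{4,i}$ requires more bookkeeping and is not attempted in this sketch.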

Similarly, $c_{4,i}$ measures the number of quadrilaterals passing through node $i$
divided by the number of possible quadrilaterals $Z_i$. This last quantity is the sum of
all possible primary quadrilaterals $Z_i^p$ (where all vertices are nearest neighbors of
node $i$) and all possible secondary quadrilaterals $Z_i^s$ (where one of the vertices is a
second neighbor of node $i$). If node $i$ has $k_i^{nn}$ second neighbors,
$Z_i^p = k_i(k_i - 1)(k_i - 2)/2$ and $Z_i^s = k_i^{nn} k_i (k_i - 1)/2$. In Fig. 3(a) we
plot $c_3(k)$ and $c_4(k)$ for the Internet data at three different times (November 1997,
January 1999 and January 2001), showing that the behavior of $c_3(k)$ and $c_4(k)$ is
invariant with time and scales as

$$c_h(k) \sim k^{-\delta(h)} \qquad (10)$$

with $\delta(3) = 0.7(1)$ and $\delta(4) = 1.1(1)$.

In Fig. 3 we compare the behavior of $c_3(k)$ and $c_4(k)$ in real Internet data with the
first order approximation (FOA) results. Again we observe that the first order
approximation results are quite satisfactory, reinforcing our thesis that to explain the
loop structure of the Internet it is sufficient to stop at this order. However, the
behavior of $c_3(k)$ and $c_4(k)$ cannot be explained just by looking at the largest
eigenvalues of the $C$ matrix; one has to consider the entire spectrum. For completeness
we also considered the behavior of the clustering coefficients $c_3(k)$ and $c_4(k)$ in
Internet models (Table II). We observe that, while in the (D) and (ND) models there are
large deviations from the scaling, these models seem in general to capture the cycle
structure of the Internet better than the other non ad-hoc models we have considered here.

In conclusion, we computed the number $N_h(t)$ of $h$-loops of size $h = 3, 4, 5$ in the
Internet at the Autonomous System level and the generalized clustering coefficients
around individual nodes as a function of the node degrees. We have observed that this
evolving network has a loop structure that is well captured by the two-point correlation
matrix. Indeed, it seems that the Internet is "Markovian", in the sense that it is not
necessary to study correlation functions of more than two points, at least to explain the
cycle structure. For this reason we have characterized the correlation matrix
$C_{k,k'} = (k' - 1)P(k|k')$
System        δ(3)           δ(4)
AS
ND

TABLE II: The exponents of the clustering coefficients $c_3(k)$ and $c_4(k)$ as measured
from Internet data, as a result of the SOA and from simulations of Internet models.